Abstract

We give conditions for an $O(1/n)$ rate of convergence of Fisher
information and relative entropy in the Central Limit Theorem. We use
the theory of projections in $L^2$ spaces and Poincaré inequalities, to
provide a better understanding of the decrease in Fisher Information
implied by results of Barron and Brown. We show that if the
standardized Fisher Information ever becomes finite then it converges to
zero.



1 Introduction 

Bounds on Shannon entropy and Fisher information have long been used in
proofs of central limit theorems (CLTs), based on quantification of the change
in information as a result of convolution, as in the papers of Linnik (1959),
Shimizu (1975), Brown (1982), Barron (1986) and Johnson (2000). Each of
these papers has a final step involving completeness or uniform integrability
in which a limit is taken without explicitly bounding the information distance
from the normal distribution.

The purpose of the present paper is to provide an explicit rate of
convergence of information distances, under certain natural conditions on the
random variables. Let $X_1, X_2, \ldots, X_n$ be independent identically
distributed random variables with mean 0, variance $\sigma^2$ and density
function $p(x)$, satisfying Poincaré conditions (relating $L^2$ norms of mean
zero functions to $L^2$ norms of the derivative), and let $\phi_{\sigma^2}(x)$
be the corresponding $N(0, \sigma^2)$ density. The relative entropy distance is
$D(X) = \int p(x) \log (p(x)/\phi_{\sigma^2}(x))\,dx$. In the case of random
variables with differentiable densities, the Fisher information distance is
$J(X) = \sigma^2 E[((d/dx) \log p(X) - (d/dx) \log \phi_{\sigma^2}(X))^2]$,
which is related to the Fisher information $I(X) = E[((d/dx) \log p(X))^2]$
via $J(X) = \sigma^2 I(X) - 1$. This is an $L^2$ norm between derivatives of
log-densities, and gives a natural measure of convergence, stronger than
existing theorems, as described in Lemma 1.6. Note that the quantities $D$ and
$J$ are scale-invariant, that is $D(aX) = D(X)$ and $J(aX) = J(X)$ for all
non-zero $a$.

Let $U_n = (X_1 + \ldots + X_n)/\sqrt{n\sigma^2}$ be the standardized sum of
the random variables. We show that $D(U_n) \le 2R\,D(U_1)/(n\sigma^2)$ for all
random variables with Poincaré constant $R$, and that
$J(U_n) \le 2R\,J(U_1)/(n\sigma^2)$ for all random variables with density
function satisfying a weak differentiability condition.

The present paper builds on ideas in past work which we briefly review here.
In examination of the Fisher information a central role is played by the score
function $\rho(y) = (d/dy) \log p(y) = p'(y)/p(y)$. The score function of the
sum of independent random variables can be expressed in terms of the score
function of the individual random variables, via a conditional expectation, as
has been used in demonstration of convolution inequalities for Fisher
information and Shannon entropy (in the work of Stam (1959), Blachman (1965),
and others).

In particular, if $Y_1$ and $Y_2$ are independent and identically distributed
with score function $\rho$ then the score $\bar\rho(u)$ of the sum $Y_1 + Y_2$
is the projection of $(\rho(Y_1) + \rho(Y_2))/2$ onto the linear space of
functions of $Y_1 + Y_2$, so by the Pythagorean identity and rescaling:

$$I(Y_1) - I\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) = 2 E\left(\bar\rho(Y_1 + Y_2) - \frac{\rho(Y_1) + \rho(Y_2)}{2}\right)^2 \qquad (1)$$

(see Lemma 3.1 for details). Hence, since Equation (1) is positive, one
deduces that the Fisher Information is decreasing on the powers-of-two
subsequence $S_k = U_{2^k}$. Equation (1) then quantifies the drop in
information $I(S_k) - I(S_{k+1})$. Identification of the normal as the limiting
distribution arises from examining the difference sequence
$I(S_k) - I(S_{k+1})$.

Papers by Shimizu (1975), Brown (1982) and Barron (1986) quantify the
change in Fisher Information with each doubling of the sample size, deducing
convergence to the normal distribution along the powers-of-two subsequence,
and convergence of the whole information sequence $I(U_n)$, by subadditivity
of $nI(U_n)$. However, these papers only ever consider the behaviour of the
Fisher Information for $X \sim Y + Z_\tau$ (for $Z_\tau$ a small normal
perturbation).

However, in general, we can conclude that if the Fisher Information $I(S_k)$
is ever finite, since it is decreasing and bounded below, this difference
sequence tends to zero. Thus the interest is in random variables $Y_1, Y_2$
with score functions for which Equation (1) is small. This expression measures
the squared difference between a 'ridge function' (a function of the sum
$Y_1 + Y_2$) and an additive function (a function of the form
$g_1(Y_1) + g_2(Y_2)$). From calculus, in general, the only functions
$f(y_1, y_2) = g_1(y_1) + g_2(y_2)$ that are both ridge and additive are the
linear functions $g_1(y_1) = a y_1 + b_1$, $g_2(y_2) = a y_2 + b_2$, with
$a, b_1, b_2$ constants, that is, the functions for which the derivatives
$g_i'(y)$ are constant and equal.

Previous work, as in Lemma 3.1 of Brown (1982), (see also Barron (1986)) 
established: 

Lemma 1.1 For any functions $f$ and $g$ there exist some $a, b$ such that:

$$E(g(Y_1) - aY_1 - b)^2 \le E(f(Y_1 + Y_2) - g(Y_1) - g(Y_2))^2,$$

when $Y_1, Y_2$ are independent identically distributed normals.

Brown takes $g \in L^2(\phi)$ and considers the projection
$f(s) = E(g(Y_1) + g(Y_2) \mid Y_1 + Y_2 = s)$. For $Y_1, Y_2$ normal, the
eigenfunctions of this projection are the Hermite polynomials, so he can use
expansions in this orthogonal Hermite basis.

The main technique used in the present paper will generalize Lemma 1.1 to
a wider class of random variables $Y_1, Y_2$. For example, consider any $Y_1$
and $Y_2$ IID with finite Fisher Information $I$. Proposition 2.1 implies that
given a differentiable ridge function $f(y_1 + y_2)$, with closest additive
function $g$, then for a certain constant $\mu$:

$$E(g'(Y_1) - \mu)^2 \le I\, E(f(Y_1 + Y_2) - g(Y_1) - g(Y_2))^2. \qquad (2)$$

Our (basis-free) proof starts with $f(Y_1 + Y_2)$, finds its additive part with
$g(y_1) = E_{Y_2} f(y_1 + Y_2)$ and recognises that
$g'(y_1) = -E_{Y_2} f(y_1 + Y_2)\rho(Y_2)$. A Cauchy-Schwarz inequality
completes the proof as detailed in Section 2.

Hence if Equation (1) is small then $\rho$ is close to a function with
derivative close to constant in $L^2(Y_1, Y_2)$. Now Poincaré inequalities
provide a relationship between $L^2$ norms on functions and the $L^2$ norms on
derivatives:



Definition 1.2 Given a random variable $Y$, define the Poincaré constant
$R_Y$:

$$R_Y = \sup_{g \in H_1(Y)} \frac{E g^2(Y)}{E g'(Y)^2}$$

(where $H_1(Y)$ is the space of absolutely continuous functions $g$ such that
$\mathrm{Var}\, g(Y) > 0$, $E g(Y) = 0$ and $E g^2(Y) < \infty$), and the
restricted Poincaré constant $R^*_Y$:

$$R^*_Y = \sup_{g \in H_1^*(Y)} \frac{E g^2(Y)}{E g'(Y)^2},$$

where $H_1^*(Y) = H_1(Y) \cap \{g : E g'(Y) = 0\}$.

For certain $Y$, $R_Y$ is infinite. However, $R_Y$ is finite for the normal and
other log-concave distributions (see for example Klaassen (1985), Chernoff
(1981), Chen (1982), Cacoullos (1982), Borovkov and Utev (1984)). Since
we maximise over a smaller set of functions, $R^*_Y \le R_Y$. Further, for
$Z \sim N(0, \sigma^2)$, $R^*_Z = \sigma^2/2$, with $g(x) = x^2 - \sigma^2$
achieving this (we can show this by expanding $g$ in the Hermite basis).
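The value $R^*_Z = \sigma^2/2$ can be checked numerically. The following Python sketch is not part of the original argument: the helper names `normal_expectation` and `poincare_ratio` are our own, and expectations under $N(0, \sigma^2)$ are approximated by a simple trapezoidal rule on a truncated interval. It evaluates the ratio $E g^2(Z)/E g'(Z)^2$ for $g(x) = x^2 - \sigma^2$.

```python
from math import exp, pi, sqrt

def normal_expectation(h, sigma, half_width=12.0, steps=20001):
    """Approximate E h(Z) for Z ~ N(0, sigma^2) by the trapezoidal rule
    on [-half_width*sigma, half_width*sigma] (an illustrative choice)."""
    lo = -half_width * sigma
    dx = 2.0 * half_width * sigma / (steps - 1)
    total = 0.0
    for i in range(steps):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * h(x) * exp(-x * x / (2.0 * sigma * sigma))
    return total * dx / (sigma * sqrt(2.0 * pi))

def poincare_ratio(g, g_prime, sigma):
    """Ratio E g(Z)^2 / E g'(Z)^2 appearing in the Poincare constant."""
    numerator = normal_expectation(lambda x: g(x) ** 2, sigma)
    denominator = normal_expectation(lambda x: g_prime(x) ** 2, sigma)
    return numerator / denominator

sigma = 1.5
ratio = poincare_ratio(lambda x: x * x - sigma ** 2, lambda x: 2.0 * x, sigma)
print(ratio, sigma ** 2 / 2.0)  # the two values should agree closely
```

The same helper applied to $g(x) = x$ gives the ratio $\sigma^2$, but that function has $E g'(Z) = 1 \neq 0$ and so lies outside the restricted class $H_1^*$.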

The other important definition that we shall require is that of weak
differentiability, introduced in Fabian and Hannan (1977). Brown and Gajek
(1990) and Lehmann and Casella (1998) discuss this condition, and provide
conditions under which it will hold that are easier to check.



Definition 1.3 A random variable $Y$ has weakly differentiable density $p$ if
there exists a function $\rho \in L^2(p)$ such that for all $f$ with
$E f(Y + u)^2$ finite, the function $g(u) = E f(Y + u)$ has a derivative
$g'(u)$ equal to $-E[f(Y + u)\rho(Y)]$.

This is a mild technical condition, allowing an exchange of limit and
integration. To see the relation to standard differentiability, we can take
$f(x) = \mathbb{I}(x \in [a, b])$. Then $g(u) = F(b - u) - F(a - u)$ (where
$F$ is the distribution function of $Y$), and
$g'(u) = -\int_{a-u}^{b-u} p(v)\rho(v)\,dv$. Thus, for any $a, b$ where
the distribution function $F$ is differentiable,
$p(b) - p(a) = \int_a^b p(x)\rho(x)\,dx$.

Using this, one can extend the Brown inequality Lemma 1.1 to hold (with a
constant depending on $I(Y_1)$ and $R^*_{Y_1}$) for a wider class of random
variables than just normals. Since linear score functions correspond to the
family of normal distributions, Equations (1) and (2) provide a means to prove
the following Central Limit Theorems:

Theorem 1.4 Given $X_1, X_2, \ldots$ IID and with finite variance $\sigma^2$,
define the normalized sum $U_n = (\sum_{i=1}^n X_i)/\sqrt{n\sigma^2}$.

If $X_i$ are weakly differentiable with finite restricted Poincaré constant
$R^*$ then

$$J(U_n) \le \frac{2R^*}{2R^* + (n-1)\sigma^2} J(X) \quad \text{for all } n.$$

If $X_i$ have finite Poincaré constant $R$, then

$$D(U_n) \le \frac{2R}{2R + (n-1)\sigma^2} D(X) \quad \text{for all } n.$$

Proof See Sections 2 and 3 for the proof of the Fisher information bound.
Notice that for $X$ normal, $2R^* = \sigma^2$, so the 'closer to normal $X$
is', the closer the bound becomes to $J(X)/n$.

The relative entropy bound is a corollary. Using an integral form of the de
Bruijn identity (Lemma 1 of Barron (1986)), the relative entropy $D(X)$ can
be expressed as an integral of $J(\sqrt{1-t}\,X + \sqrt{t}\,Z)$ (that is, a
linear combination of $X$ and a standard normal $Z$). Now, if $X$ has finite
Poincaré constant $R$, then for each $t$, the variable
$\sqrt{1-t}\,X + \sqrt{t}\,Z$ itself has Poincaré constant $\le R$, so by the
Fisher information bound of Theorem 1.4 the result follows. □
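To make the shape of the rate concrete, the following Python sketch evaluates the contraction factor $2R^*/(2R^* + (n-1)\sigma^2)$ that multiplies $J(X)$ in the final display of Section 3. The function name `clt_rate_factor` and the sample values are illustrative assumptions, not notation from the paper.

```python
def clt_rate_factor(poincare_constant, variance, n):
    """Factor 2R / (2R + (n - 1) * sigma^2) multiplying J(X) or D(X)."""
    two_r = 2.0 * poincare_constant
    return two_r / (two_r + (n - 1) * variance)

variance = 4.0
# Normal case: the restricted Poincare constant is sigma^2 / 2,
# so the factor reduces to exactly 1/n.
for n in (1, 2, 10, 100):
    print(n, clt_rate_factor(0.5 * variance, variance, n), 1.0 / n)

# A larger (hypothetical) constant gives a slower, but still O(1/n), decay,
# and the factor never exceeds the cruder bound 2R / (n * sigma^2)
# whenever 2R >= sigma^2.
print(clt_rate_factor(3.0, 1.0, 10), 2.0 * 3.0 / (10 * 1.0))
```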

This $O(1/n)$ rate of convergence is perhaps to be expected. For example if
$X_i$ is exponentially distributed, and hence $U_n$ has a $\Gamma(n)$
distribution, then $J(U_n) = 2/(n - 2)$, consistent with this. In fact, by
extending the Cramér-Rao lower bound we deduce that

Lemma 1.5 If $EX^4$ is finite, then

$$\liminf_{n \to \infty} n J(U_n) \ge \frac{s^2}{3},$$

where $s$ is the skewness, $m_3(X)/m_2(X)^{3/2}$ (writing $m_r(X)$ for the
centred $r$th moment of $X$).

Proof For any function $f$, the positivity of $E(\rho_U(U) + f(U))^2$ implies
that $E\rho_U(U)^2 \ge E(2f'(U) - f(U)^2)$, giving a whole family of bounds.
Assume that $EU = 0$ and take $f(u) = (u - u^2 m_3(U)/m_4(U))/m_2(U)$. Then
$J(U) = m_2(U) E\rho_U(U)^2 - 1 \ge m_3(U)^2/(m_2(U) m_4(U))$. Now, since
$m_2(U_n) = m_2(X)$, $m_3(U_n) = m_3(X)/\sqrt{n}$ and
$m_4(U_n) = m_4(X)/n + 3m_2(X)^2(n-1)/n$, the result follows. □
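The exponential example and the moment bound from the proof above can be compared numerically. The Python sketch below is an illustration under stated assumptions: $X_i$ is taken to be exponential with rate 1, so that $X_1 + \ldots + X_n$ has a Gamma density with shape $n$; the Fisher information is approximated by a midpoint rule; and the function names are our own. It uses the scale invariance of $J$, so the unnormalized sum can be used directly.

```python
from math import exp, lgamma, log

def gamma_fisher_information(n, upper=80.0, steps=200000):
    """Midpoint-rule approximation of I(S) = E[score(S)^2] for
    S ~ Gamma(shape n, scale 1), valid for n > 2."""
    h = upper / steps
    log_norm = lgamma(n)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = exp((n - 1) * log(x) - x - log_norm)
        score = (n - 1) / x - 1.0  # derivative of the log-density
        total += score * score * density
    return total * h

def standardized_fisher_information(n):
    """J = variance * I - 1; the Gamma(n, 1) variance is n."""
    return n * gamma_fisher_information(n) - 1.0

def moment_lower_bound(n):
    """m3^2 / (m2 * m4) for Gamma(n, 1): m2 = n, m3 = 2n, m4 = 3n^2 + 6n."""
    m2, m3, m4 = n, 2.0 * n, 3.0 * n * n + 6.0 * n
    return m3 * m3 / (m2 * m4)

for n in (5, 10, 20):
    print(n, standardized_fisher_information(n), 2.0 / (n - 2),
          moment_lower_bound(n))
```

In this example $n J(U_n) = 2n/(n-2)$ tends to 2, while $n$ times the moment bound equals $4n/(3n + 6)$, so the lower bound holds with room to spare.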

Further, this $O(1/n)$ convergence is consistent with estimates of
Berry-Esseen type which give a $O(1/\sqrt{n})$ rate of weak convergence. The
following lemma shows the relationship between convergence in Fisher
Information, and several weaker forms of convergence:

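Lemma 1.6 If $X$ is a random variable with density $f$, and $\phi$ is a
standard normal, then:

$$\sup_x |f(x) - \phi(x)| \le \left(1 + \sqrt{6/\pi}\right)\sqrt{J(X)},$$

$$\int |f(x) - \phi(x)|\,dx \le 2 d_H(f, \phi) \le \sqrt{2}\sqrt{J(X)},$$

where $d_H(f, \phi)$ is the Hellinger distance
$\left(\int |\sqrt{f(x)} - \sqrt{\phi(x)}|^2\,dx\right)^{1/2}$.

Proof The first bound comes from Shimizu (1975). The second inequality
tightens a bound of Shimizu. Since, writing $\rho$ for the score function of
$f$:

$$\frac{\partial}{\partial x}\sqrt{\frac{f(x)}{\phi(x)}} = \frac{1}{2}\sqrt{\frac{f(x)}{\phi(x)}}\,(\rho(x) + x),$$

we deduce from the Poincaré inequality for $\phi$ that:

$$J(X) = 4\int \phi(x)\left(\frac{\partial}{\partial x}\sqrt{\frac{f(x)}{\phi(x)}}\right)^2 dx \ge 4\int \phi(x)\left(\sqrt{\frac{f(x)}{\phi(x)}} - \mu\right)^2 dx = 4(1 - \mu^2),$$

where $\mu = E_\phi\sqrt{f/\phi}$, so
$2 d_H^2(f, \phi) = 2(2 - 2\mu) \le 4(1 - \mu^2)$. □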

Recent work by Ball et al (2002) has also considered the rate of convergence of 
these quantities. Their paper obtains similar results, but by a very different 
method, involving transportation costs and a variational characterisation of 
Fisher Information. 

Unfortunately, Poincaré constants are not finite for all distributions $Y$.
Indeed, as Borovkov and Utev (1984) point out, if $R_Y < \infty$, then by
considering $g_n(x) = |x|^n$, we inductively deduce that all the moments of $Y$
are finite. From the Berry-Esseen Theorem we know that only $(2+\delta)$th
moment conditions are enough to ensure an explicit $O(1/n^{\delta/2})$ rate of
weak convergence, for $0 < \delta \le 1$. In Section 4 we describe a proof of
Fisher Information convergence under only second moment conditions, though
without an explicit rate. This is an extension of Barron's Lemma 2, which only
holds for random variables with a normal perturbation.

Theorem 1.7 Given $X_1, X_2, \ldots$ weakly differentiable IID with finite
variance $\sigma^2$, define the normalized sum
$U_n = (\sum_{i=1}^n X_i)/\sqrt{n\sigma^2}$. If $J(U_m)$ is finite for some
$m$ then

$$\lim_{n \to \infty} J(U_n) = 0.$$

Note: This extends Lemma 2 of Barron (1986), which only holds when $X$
is of the form $Y + Z_\tau$.

2 Projection of functions in $L^2$

Although the main application of the following Proposition will concern score
functions, we present it as an abstract result concerning projection of
functions in $L^2(Y_1, Y_2)$.

Proposition 2.1 Consider independent random variables $Y_1, Y_2$ with weakly
differentiable densities and functions $f, h_1, h_2$ such that
$E[(f(Y_1 + Y_2))^2]$ is finite and $E f(Y_1 + Y_2) = 0$. We find functions
$g_1, g_2$ and a constant $\mu$ such that for any $\beta \in [0, 1]$:

$$E(f(Y_1 + Y_2) - h_1(Y_1) - h_2(Y_2))^2 \ge E(g_1(Y_1) - h_1(Y_1))^2 + E(g_2(Y_2) - h_2(Y_2))^2 + \frac{1}{\bar{I}}\left(\beta E(g_1'(Y_1) - \mu)^2 + (1 - \beta) E(g_2'(Y_2) - \mu)^2\right),$$

where $\bar{I} = (1 - \beta) I(Y_1) + \beta I(Y_2)$.

Proof Firstly, given $E(f(Y_1 + Y_2) - h_1(Y_1) - h_2(Y_2))^2$, we can replace
$h_1$ and $h_2$ by functions $g_1, g_2$ which reduce it even further. This
follows since this distance is minimised over choices of $h_1, h_2$ by
considering the projections, which remove the additive part:

$$g_1(u) = E_{Y_2} f(u + Y_2),$$
$$g_2(v) = E_{Y_1} f(Y_1 + v),$$

so the Pythagorean relation tells us that the LHS equals

$$E(g_1(Y_1) - h_1(Y_1))^2 + E(g_2(Y_2) - h_2(Y_2))^2 + E(f(Y_1 + Y_2) - g_1(Y_1) - g_2(Y_2))^2.$$

Having removed the additive part of /, we hope that what remains will be 
small in magnitude. Hence, the inner product of what remains and certain 
functions of the variables should be small. Specifically we define 

$$r_1(u) = E_{Y_2}[(f(u + Y_2) - g_1(u) - g_2(Y_2))\rho_2(Y_2)],$$
$$r_2(v) = E_{Y_1}[(f(Y_1 + v) - g_1(Y_1) - g_2(v))\rho_1(Y_1)],$$

and show that we can control their norms. Indeed, by Cauchy-Schwarz, for
any $u$:

$$r_1^2(u) \le E_{Y_2}(f(u + Y_2) - g_1(u) - g_2(Y_2))^2\, E\rho_2^2(Y_2),$$

so taking expectations over $Y_1$, we deduce that

$$E r_1^2(Y_1) \le E(f(Y_1 + Y_2) - g_1(Y_1) - g_2(Y_2))^2\, I(Y_2). \qquad (3)$$

Similarly,

$$E r_2^2(Y_2) \le E(f(Y_1 + Y_2) - g_1(Y_1) - g_2(Y_2))^2\, I(Y_1). \qquad (4)$$

The assumption that $E f(Y_1 + Y_2)^2$ is finite implies that for almost every
$u$ we have $E f(u + Y_2)^2$ finite. Consider any such $u$. The weak
differentiability of $p_2$ (the density of $Y_2$) gives that the function
$g_1(u) = E[f(u + Y_2)]$ has derivative
$g_1'(u) = -E f(u + Y_2)\rho_2(Y_2)$. Also weak differentiability trivially
yields $E\rho_2(Y_2) = 0$, so setting $\mu = -E g_2(Y_2)\rho_2(Y_2)$, we
recognize that $r_1(u)$ defined above simplifies to

$$r_1(u) = -(g_1'(u) - \mu).$$

Using the similar expression for $r_2(v) = -(g_2'(v) - \mu)$, and adding
$\beta$ times Equation (3) to $(1 - \beta)$ times Equation (4), we deduce the
result. □

Note: this inequality holds in general, for any weakly differentiable
$Y_1, Y_2$ with finite Fisher Information, whereas previous such expressions
have only held in the case of $Y_i \sim U_i + Z_\tau$, for some $U_i$.

Note: this inequality allows for independent random variables that are not 
identically distributed. Armed with it, one may provide Central Limit The- 
orems giving information convergence to the normal for random variables 
satisfying a uniform Lindeberg-type condition (see also Johnson (2000)). In 
certain cases we can provide a rate of convergence. 

Note: we can produce a similar expression using a similar method for
finite-dimensional random vectors $Y_1, Y_2$. Weak differentiability can be
defined in this case, and $\rho_i = (\partial/\partial x_i)(\log p(x))$ will
usually be the $i$th component of the score vector function $\rho$. Similar
analysis in this case can lead to an alternative proof of the Theorems in
Johnson and Suhov (2001).

3 Rate of convergence 

If $Y_1, Y_2$ have finite restricted Poincaré constants $R_1^*, R_2^*$ then we
can extend Lemma 1.1 from the case of normal $Y_1, Y_2$ to more general
distributions, providing an explicit exponential rate of convergence of Fisher
Information. We can apply Proposition 2.1 because the score functions of sums
can be expressed as projections.

Lemma 3.1 Let $S = Y_1 + Y_2$ where $Y_1$ and $Y_2$ are independent and $Y_2$
is weakly differentiable with score function $\rho_2$. Then $S$ is weakly
differentiable with score function

$$\bar\rho(s) = E[\rho_2(Y_2) \mid S = s].$$

Hence for independent weakly differentiable random variables $Y_1$ and $Y_2$,
with score functions $\rho_1$ and $\rho_2$, writing $\bar\rho$ for the score
function of $S$:

$$E\left(\sqrt{2}\,\bar\rho(S) - \frac{\rho_1(Y_1) + \rho_2(Y_2)}{\sqrt{2}}\right)^2 = \frac{I(Y_1) + I(Y_2)}{2} - I\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right).$$

Proof For any square integrable test function $T(u + S)$, define
$T_2(v) = E[T(v + Y_1)]$ so that $T_2(u + Y_2) = E[T(u + S) \mid Y_2]$. Note
that $E[(T_2(u + Y_2))^2] \le E[T^2(u + S)] < \infty$, so that weak
differentiability can be applied to it. That is, we have
$E[T(u + S)\bar\rho(S)] = E[T(u + S)\rho_2(Y_2)] = E[T_2(u + Y_2)\rho_2(Y_2)]
= -(d/du) E T_2(u + Y_2) = -(d/du) E T(u + S)$.

Then if both random variables are weakly differentiable,
$\bar\rho(s) = E[(\rho_1(Y_1) + \rho_2(Y_2))/2 \mid S = s]$. Thus by the
Pythagorean identity, the result follows, since we can rescale:
$\rho_{aX}(x) = \rho_X(x/a)/a$ and $I(aX) = I(X)/a^2$. □

Proposition 3.2 Consider $Y_1, Y_2$ IID and weakly differentiable with
variance $\sigma^2$ and restricted Poincaré constant $R^*$. Then

$$J\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) \le J(Y_1)\left(\frac{2R^*}{\sigma^2 + 2R^*}\right).$$

Proof Without loss of generality, suppose $Y_i$ have mean 0 and variance
1, since we can just rescale, using $R^*_{aX} = a^2 R^*_X$. Write $J$ and $I$
for the standardized and non-standardized Fisher Information of $Y$, and $J'$
and $I'$ for the corresponding quantities for $(Y_1 + Y_2)/\sqrt{2}$. By
rescaling Lemma 3.1, writing $\bar\rho$ for the score function of
$(Y_1 + Y_2)/\sqrt{2}$ and $(g(Y_1) + g(Y_2))/\sqrt{2}$ for its projection
onto the space of additive functions:

$$J(Y_1) - J\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) = E\left(\bar\rho\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) - \frac{g(Y_1) + g(Y_2)}{\sqrt{2}}\right)^2 + E(g(Y_1) - \rho(Y_1))^2.$$

Now, consider the projection of $\bar\rho$ into the space of additive
functions, shown as a plane in Figure 1, where $(h(Y_1) + h(Y_2))/\sqrt{2}$ is
the closest point to $\bar\rho$ on the line between $-(Y_1 + Y_2)/\sqrt{2}$
and $(\rho(Y_1) + \rho(Y_2))/\sqrt{2}$, so that
$E(g(Y) + Y)^2 \ge E(h(Y) + Y)^2$.

Further, we know that $h$ corresponds to the value of $\lambda$ which
minimises:

$$E\left(\bar\rho\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) + \frac{Y_1 + Y_2}{\sqrt{2}} - \lambda\,\frac{\rho(Y_1) + Y_1 + \rho(Y_2) + Y_2}{\sqrt{2}}\right)^2.$$

Figure 1: Role of projections

Since in general $E(U - \lambda V)^2$ is minimised at
$\lambda = EUV/EV^2$, in this case the minimising $\lambda = J'/J$, so $h$ is
$J'/J$ of the way along the line. This tells us that
$E(h(Y) + Y)^2 = (J'/J)^2 E(\rho(Y) + Y)^2 = J'^2/J$.

Overall then, we deduce that $E(g(Y) + Y)^2 \ge J'^2/J$, and by Pythagoras,

$$E\left(\bar\rho\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) - \frac{g(Y_1) + g(Y_2)}{\sqrt{2}}\right)^2 \le J' - J'^2/J.$$
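The fact that $E(U - \lambda V)^2$ is minimised at $\lambda = EUV/EV^2$ is the usual formula for projecting onto a line. A minimal Python sketch of this step follows, with finite sample averages standing in for expectations; the data and function names are purely illustrative.

```python
def projection_coefficient(u, v):
    """Return the lambda minimising sum((u_i - lambda * v_i)^2),
    namely <u, v> / <v, v> (the empirical analogue of EUV / EV^2)."""
    uv = sum(a * b for a, b in zip(u, v))
    vv = sum(b * b for b in v)
    return uv / vv

def squared_distance(u, v, lam):
    """Empirical analogue of E(U - lambda * V)^2, up to a factor 1/len(u)."""
    return sum((a - lam * b) ** 2 for a, b in zip(u, v))

u = [1.0, 2.0, 3.0]
v = [1.0, 1.0, 2.0]
lam = projection_coefficient(u, v)
# Moving lambda in either direction increases the squared distance.
print(lam, squared_distance(u, v, lam),
      squared_distance(u, v, lam - 0.1), squared_distance(u, v, lam + 0.1))
```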

Now applying Proposition 2.1 to the first bracket, we can see that the factor
of $I$ in the denominator that Proposition 2.1 implies will actually cancel,
simplifying the expression.

$$\begin{aligned}
J' - J'^2/J &\ge E\left(\bar\rho\left(\frac{Y_1 + Y_2}{\sqrt{2}}\right) - \frac{g(Y_1) + g(Y_2)}{\sqrt{2}}\right)^2 \\
&\ge \frac{E(g'(Y_1) - \mu)^2}{2I} \\
&\ge \frac{E(g(Y_1) - \mu Y_1)^2}{2R^* I} \\
&= \frac{E(g(Y_1) + Y_1)^2 + J'^2}{2R^* I} \\
&\ge \frac{J'^2/J + J'^2}{2R^* I} = \frac{J'^2}{2R^* J},
\end{aligned}$$

since $E g'(Y_1) - \mu = 0$ where $\mu = -I'$, and since
$E Y_1 g(Y_1) = -1$, so rearranging, we obtain the result.

Note that $g_1$ will be absolutely continuous, so we can apply the Poincaré
inequality. This follows since
$\int_v^w g_1'(u)\,du = -\int_v^w \int f(s) p'(s - u)\,ds\,du$ and
Fubini's Theorem tells us that this is
$-\int f(s) \int_v^w p'(s - u)\,du\,ds = \int f(s)(p(s - w) - p(s - v))\,ds
= g_1(w) - g_1(v)$, as the random variables have a density everywhere. □

A more careful analysis generalises Proposition 3.2 to obtain Theorem 1.4 by
performing successive projections onto smaller additive spaces. For a given
function $f$, define a series of functions by $f_n = f$, and for $m < n$:

$$f_m\left(\frac{X_1 + \ldots + X_m}{\sqrt{n}}\right) = E_{X_{m+1}} f_{m+1}\left(\frac{X_1 + \ldots + X_m + X_{m+1}}{\sqrt{n}}\right).$$

Further, define
$g(u) = \sqrt{n}\, E f\left((u + X_2 + \ldots + X_n)/\sqrt{n}\right)$. At step
$i$, we approximate the function $f$ by $f_i((X_1 + \ldots + X_i)/\sqrt{n})$
plus a sum of $g(X_j)/\sqrt{n}$ for $j > i$, which is the best approximation
onto the linear space of such partially additive functions.

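Lemma 3.3 Defining the squared distance between successive projections to
be

$$t_i = E\left(f_i\left(\frac{X_1 + \ldots + X_i}{\sqrt{n}}\right) - f_{i-1}\left(\frac{X_1 + \ldots + X_{i-1}}{\sqrt{n}}\right) - \frac{g(X_i)}{\sqrt{n}}\right)^2,$$

then for $X_i$ IID and weakly differentiable:

$$t_i \ge \frac{i - 1}{n I(X)}\, E(g'(X) - \mu)^2.$$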



Proof Evaluate the function

$$r(z) = E\left[\left(f_i\left(\frac{X_1 + \ldots + X_{i-1} + z}{\sqrt{n}}\right) - f_{i-1}\left(\frac{X_1 + \ldots + X_{i-1}}{\sqrt{n}}\right) - \frac{g(z)}{\sqrt{n}}\right)(\rho(X_1) + \ldots + \rho(X_{i-1}))\right]$$

in two different ways. Firstly, again using weak differentiability of $X_j$:

$$r(z) = -\left(\frac{i - 1}{\sqrt{n}}\right)(g'(z) - \mu),$$

where $\mu = E f_{i-1}' = E f'$. Secondly, we apply Cauchy-Schwarz to
$r(z)^2$, and take expected values, to deduce that
$E r(X)^2 \le t_i (i - 1) I(X)$. Putting these together, the result
follows. □

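Lemma 3.4 For $X_i$ IID, the sum of these squared distances $t_i$ is
$s_n = \sum_{i=1}^n t_i$, where

$$s_m = E\left(f_m\left(\frac{X_1 + \ldots + X_m}{\sqrt{n}}\right) - \sum_{i=1}^m \frac{g(X_i)}{\sqrt{n}}\right)^2.$$

Proof Since $s_m = E f_m^2 - (m/n) E g^2$, and since
$t_m = E f_m^2 - E f_{m-1}^2 - (1/n) E g^2$, we can rearrange to obtain:

$$t_m = s_m - s_{m-1},$$

so summing the telescoping sum, the result follows. □

Combining Lemma 3.3 and Lemma 3.4 we deduce that:

$$s_n \ge \sum_{i=1}^n \frac{i - 1}{n I(X)}\, E(g'(X) - \mu)^2 = \frac{n - 1}{2 I(X)}\, E(g'(X) - \mu)^2. \qquad (5)$$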

Proof of Theorem 1.4 Again, assuming that $X$ has variance 1, and writing
$J'$ for $J(U_n)$, and $J$ for $J(X)$, as before we know that
$E(g(X) + X)^2 \ge J'^2/J$ and
$s_n = E\left(\rho_n(U_n) - \sum_i g(X_i)/\sqrt{n}\right)^2 \le J'(1 - J'/J)$.
Hence by Equation (5), we deduce that:

$$\begin{aligned}
J'(1 - J'/J) \ge s_n &\ge \frac{n - 1}{2 I(X)}\, E(g'(X) - \mu)^2 \\
&\ge \frac{n - 1}{2R^* I(X)}\, E(g(X) - \mu X)^2 \\
&\ge \frac{(n - 1) J'^2}{2R^* J}.
\end{aligned}$$

Thus, in general, rescaling gives:

$$J(U_n) \le \frac{2R^*}{2R^* + (n - 1)\sigma^2}\, J(X),$$

and the result follows. □
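The gain from the successive-projection argument over simply iterating Proposition 3.2 can be illustrated numerically. The Python sketch below compares the two contraction factors along $n = 2^k$, writing $c = 2R^*/\sigma^2$. It rests on an assumption that is ours and is not established in the text: that the same restricted Poincaré constant $R^*$ may be reused at every doubling step. The function names are also our own.

```python
def doubling_factor(c, k):
    """Factor from applying the bound 2R*/(sigma^2 + 2R*) = c/(1 + c)
    k times, i.e. along n = 2^k (assumes R* is unchanged at each step)."""
    return (c / (1.0 + c)) ** k

def direct_factor(c, n):
    """Factor 2R*/(2R* + (n - 1) sigma^2) = c/(c + n - 1)."""
    return c / (c + n - 1.0)

for c in (1.0, 2.0, 5.0):
    for k in (1, 2, 4, 8):
        n = 2 ** k
        print(c, n, doubling_factor(c, k), direct_factor(c, n))
```

For $c = 1$ (the normal case) both factors equal $1/n$. For $c > 1$ the iterated bound decays only like $n^{-\log_2(1 + 1/c)}$, whereas the direct factor is $O(1/n)$.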



Note: for $X$ discrete-valued, $X + Z_\tau$ has a finite Poincaré constant,
and hence this calculation of an explicit rate of convergence of
$J(S_n + Z_\tau)$ still holds. Via Lemma 1.6 we know that $S_n + Z_\tau$
converges weakly for any $\tau$ and hence $S_n$ converges weakly to the
standard normal.

4 Convergence of Fisher Information 

We can still obtain convergence of the Fisher Information, though without
such an attractive rate of convergence, if the Poincaré constants are not
finite. We will need uniform control over the tails of the Fisher Information,
and then will bound it on the rest of the region using the projection
arguments of Section 2. Recall that for $I(X)$ finite, the density of $X$ is
bounded (since $p(y) \le \int p(x)|\rho(x)|\,dx \le \sqrt{I(X)}$).

Definition 4.1 Given a function $\psi$, we define the following class:

$$C_\psi = \{X : EX = 0,\ \sigma^2 = EX^2 < \infty,\ \sigma^2 E\rho(X)^2\, \mathbb{I}(|X| \ge \sigma R) \le \psi(R) \text{ for all } R\}.$$

Lemma 4.2 For $X_1, X_2, \ldots$ IID with finite variance and finite $I(X)$,
then $U_m \in C_\psi$ for all $m$, where
$\psi(R) = E\rho(X)^2\, \mathbb{I}(|X| \ge R) + C/R^{1/2}$.

Proof We take the common variance to be equal to 1 and use the notation
that $p$ and $\rho$ stand for the density and score function of a single $X$,
and $p_r$ for the density of $X_1 + \ldots + X_r$. We know that $U_m$ has
score function
$\rho_m(u) = E\left(\sum_i \rho(X_i) \mid U_m = u\right)/\sqrt{m}$, so by the
conditional version of Jensen's inequality

$$\rho_m(u)^2 \le E(\rho(X_1)^2 \mid U_m = u) + (m - 1) E(\rho(X_1)\rho(X_2) \mid U_m = u). \qquad (6)$$
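Consider the two terms of Equation (6) separately, firstly writing $W$ for
$X_2 + \ldots + X_m$:

$$\begin{aligned}
E\rho(X_1)^2\, \mathbb{I}(|U_m| \ge R) &\le E\rho(X_1)^2\left(\mathbb{I}(|X_1| \ge R, |U_m| \ge R) + \mathbb{I}(|X_1| < R, |U_m| \ge R)\right) \\
&\le E\rho(X_1)^2\left(\mathbb{I}(|X_1| \ge R) + \mathbb{I}(|W| \ge R(\sqrt{m} - 1))\right) \\
&\le E\rho(X)^2\, \mathbb{I}(|X| \ge R) + \frac{I(X)(m - 1)}{R^2(\sqrt{m} - 1)^2}.
\end{aligned}$$

Then for any $u$, writing $q_m$ for the density of $U_m$:

$$\begin{aligned}
E(\rho(X_1)\rho(X_2) \mid U_m = u) &= \frac{\sqrt{m}\int\!\!\int \rho(v)\rho(w)\, p_{m-2}(u\sqrt{m} - v - w)\, p(v) p(w)\,dv\,dw}{q_m(u)} \\
&= \frac{\sqrt{m}\int p_{m-2}(u\sqrt{m} - x)\int p'(v) p'(x - v)\,dv\,dx}{q_m(u)} = \frac{q_m''(u)}{m\, q_m(u)}.
\end{aligned}$$

So the second term of Equation (6) integrates over $|u| \ge R$ to
$((m - 1)/m)(q_m'(-R) - q_m'(R))$ and we need a function $\psi'$ such that for
all $R$:

$$|q_m'(R)| \le \psi'(R). \qquad (7)$$

For all $m$, $q_m(x) \le \sqrt{I(U_m)} \le \sqrt{I}$ for some $I$. Since for
any $m$:

$$\begin{aligned}
|q_{2m}'(u)| &= \left|2\int q_m'(v)\, q_m(u\sqrt{2} - v)\,dv\right| \\
&\le 2^{3/4}\left(\int \frac{q_m'(v)^2}{q_m(v)}\,dv\right)^{1/2}\left(\int \sqrt{2}\, q_m(v)\, q_m^2(u\sqrt{2} - v)\,dv\right)^{1/2} \\
&\le 2^{3/4}\sqrt{I}\left(\sqrt{I}\int \sqrt{2}\, q_m(v)\, q_m(u\sqrt{2} - v)\,dv\right)^{1/2} = 2^{3/4} I^{3/4} q_{2m}(u)^{1/2}
\end{aligned}$$

(a similar bound will hold for $q_{2m+1}$) and

$$q_m(u) \le \int_u^\infty |q_m'(v)|\,dv \le \left(\int_u^\infty \frac{q_m'(v)^2}{q_m(v)}\,dv\right)^{1/2}\left(\int_u^\infty q_m(v)\,dv\right)^{1/2} \le \frac{\sqrt{I}}{u},$$

we deduce that Equation (7) holds, with $\psi'(R) = 2^{3/4} I/R^{1/2}$. Note
that under a $(2 + \delta)$th moment condition, we obtain
$\psi'(R) = C/R^{(1+\delta)/2}$. □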
By results of Brown (1982), we know that under a finite variance condition,
there exists $\theta(R)$ such that
$EX^2\, \mathbb{I}(|X| \ge \sigma R) \le \theta(R)$. If in addition,
$E|X|^{2+\delta}$ is finite for some $\delta$, the Rosenthal inequality
implies that $E|U_n|^{2+\delta}$ is uniformly bounded, so we can take
$\theta(R) = 1/R^\delta$.

The other ingredient we require is a bound on the Poincaré constant
$R^T_{U_n}$ (the Poincaré constant of $U_n$ conditioned on $|U_n| \le T$).

Lemma 4.3 If $I(X)$ is finite then there exist $R(T)$ and $N(T)$ such that for
all $T$, $R^T_{U_n} \le R(T)$ for $n \ge N(T)$.

Proof Writing $d_n = \sup_x |f_n(x) - \phi(x)|$ (which tends to zero), since is
bounded then: 



Now, for given T, take N{T) = 2min|n : (^21 + y/ljnj d„ < 0(T)/2|. 

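This implies that $f_n(x) \ge \phi(T)/2$, for $x \in [-T, T]$ and
$n \ge N(T)$, so $R(T) = 2/\phi(T)$ means

$$\int_x^T y f_n(y)\,dy \le R(T) f_n(x), \quad \text{for } 0 \le x \le T,$$

$$-\int_{-T}^x y f_n(y)\,dy \le R(T) f_n(x), \quad \text{for } 0 \ge x \ge -T,$$

since the LHS is always less than 1, so by Theorem 1 of Borovkov and Utev
(1984) we are done. □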



□ 



Combining these two results gives: 
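Proof of Theorem 1.7 Firstly, using the projection inequalities (Proposition
2.1) there exist a function $g$ and constants $\mu, \nu$ such that:

$$\begin{aligned}
J(U_n) - J(U_{2n}) &\ge E(\rho_n(U_n) - g(U_n))^2 + \frac{1}{2I(U_n)} E(g'(U_n) - \mu)^2 \\
&\ge E(\rho_n(U_n) - g(U_n))^2\, \mathbb{I}(|U_n| \le T) + \frac{1}{2I(U_n)} E(g'(U_n) - \mu)^2\, \mathbb{I}(|U_n| \le T) \\
&\ge E(\rho_n(U_n) - g(U_n))^2\, \mathbb{I}(|U_n| \le T) + \frac{1}{2R^T_{U_n} I(U_n)} E(g(U_n) - \mu U_n - \nu)^2\, \mathbb{I}(|U_n| \le T) \\
&\ge \frac{1}{1 + 2R^T_{U_n} I(U_n)} E(\rho_n(U_n) - \mu U_n - \nu)^2\, \mathbb{I}(|U_n| \le T).
\end{aligned}$$

Now $\mu = E\rho_{2n}'(U_{2n}) = -I(U_{2n})$, and
$\nu = -E(\rho_n + U_n)\, \mathbb{I}(|U_n| > T)$, so

$$J(U_n) \le (1 + 2R^T_{U_n} I(U_n))(J(U_n) - J(U_{2n})) + E(\rho_n(U_n) - \mu U_n - \nu)^2\, \mathbb{I}(|U_n| > T),$$

and hence by Lemmas 4.2 and 4.3, for some function $\zeta(T)$ such that
$\zeta(T) \to 0$ as $T \to \infty$:

$$J(U_n) \le (1 + 2R^T_{U_n} I(U_n))(J(U_n) - J(U_{2n})) + \zeta(T).$$

For any $\epsilon$ we can find $T_0$ such that $\zeta(T_0) \le \epsilon$;
then for all $n \ge N(T_0)$,
$(1 + 2R^{T_0}_{U_n} I(U_n))(J(U_n) - J(U_{2n})) \le (1 + 2R(T_0) I)(J(U_n) - J(U_{2n})) \le \epsilon$
for $n$ sufficiently large. □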